Inherited defects of piRNA biogenesis cause transposon de-repression, impaired spermatogenesis, and human male infertility

piRNAs are crucial for transposon silencing, germ cell maturation, and fertility in male mice. Here, we report on the genetic landscape of piRNA dysfunction in humans and present 39 infertile men carrying biallelic variants in 14 different piRNA pathway genes, including PIWIL1, GTSF1, GPAT2, MAEL, TDRD1, and DDX4. In some affected men, the testicular phenotypes differ from those of the respective knockout mice and range from complete germ cell loss to the production of a few morphologically abnormal sperm. A reduced number of pachytene piRNAs was detected in the testicular tissue of variant carriers, demonstrating impaired piRNA biogenesis. Furthermore, LINE1 expression in spermatogonia links impaired piRNA biogenesis to transposon de-silencing and serves to classify variants as functionally relevant. These results establish the disrupted piRNA pathway as a major cause of human spermatogenic failure and provide insights into transposon silencing in human male germ cells.


Introduction
PIWI-interacting RNAs (piRNAs) represent a speci c type of regulatory, single-stranded small non-coding RNAs preferentially expressed in germ cells.They are required for transposon silencing, thus safeguarding genome integrity in the fetal and adult mammalian testis, and sculpting the post-meiotic transcriptome.In contrast to mice, in which disrupted piRNA biogenesis has been tightly linked to malespeci c infertility, the role of the piRNA pathway in spermatogenic failure in men remains largely unexplored.
piRNAs bind to a subclade of Argonaute proteins known as PIWI proteins (derived from P-elementinduced-wimpy testis) 1 .Based on their temporal expression in mice, piRNAs are classi ed into three distinct main categories: fetal, pre-pachytene, and pachytene piRNAs 2 .Fetal piRNAs, which are expressed in prospermatogonia, are loaded into both PIWIL2 and PIWIL4, while pre-pachytene piRNAs, expressed in postnatal spermatogonia, are exclusively associated with PIWIL2.Finally, pachytene piRNAs bound by PIWIL1 and PIWIL2 are rst produced when germ cells enter the pachytene stage of meiosis and account for approximately 95% of piRNAs in the adult testis 1 .
Pachytene piRNAs originate predominantly from non-transposable element (TE) intergenic loci and regulate gene expression by controlling transcriptional degradation or translational initiation at the late and post-meiotic stages of spermatogenesis [3][4][5][6] .On the contrary, fetal and pre-pachytene piRNAs are enriched in TE-targeting sequences and essential for their post-transcriptional silencing through the piRNA-induced silencing complex (piRISC) mediated slicer activity [7][8][9] .In addition, fetal piRNAs are required for de novo transposon methylation 10 .piRNA biogenesis can be differentiated into two pathways involving not only PIWI proteins but also Tudor domain-containing proteins (TDRDs) acting as scaffolds, along with several enzymes involved in pre-piRNA trimming and maturation 11 (Supplementary Fig. 1a,b).The biogenesis of pachytene piRNAs is restricted to the primary pathway, in which long piRNA precursors are transferred from the nucleus to the cytoplasm and accumulate in perinuclear structures called nuages.Here, mature piRNAs are produced through cleavage and processing of piRNA precursors.This cleavage is independent of piRISC activity and is initiated by the endonuclease PLD6, which establishes the 5'-ends of pre-piRNAs 12,13 .In contrast, in the fetal testis, complementary long piRNA precursors are mainly cleaved by PIWIL2-bound piRISC complexes.Here, the massive ampli cation of TE-derived fetal piRNAs is established in the secondary pathway (ping-pong cycle) 1 .
Knockout mice for more than twenty genes related to the piRNA-pathway have been analyzed 11 .These mice are concordantly affected by male-speci c infertility, small testes, and germ cell maturation arrest at meiosis or early haploid cell stages.Furthermore, a substantial reduction in the amount of piRNAs and consequent de-repression of TEs in the fetal and/or adult testis was observed in several mouse models [14][15][16][17] .
Here, we shed light on the impact of disrupted piRNA biogenesis on human spermatogenesis by presenting 39 infertile men carrying rare, biallelic, putative pathogenic variants in 14 different genes encoding proteins of the piRNA pathway.Interestingly, the observed testicular phenotypes repeatedly differ from those of the respective knockout mice.Furthermore, we show that the dysfunction of piRNA pathway proteins in the human adult testis not only leads to a reduced amount of pachytene piRNAs but is also associated with a gene-speci c increase of transposon expression in spermatogonia.These analyses can serve as readout for the functional relevance, i.e., pathogenicity, especially of the identi ed missense variants.

Genes encoding piRNA biogenesis proteins are frequently mutated in infertile men
To elucidate protein networks or biological pathways contributing to impaired spermatogenesis, we queried exome/genome data of > 2,000 infertile men from the Male Reproductive Genomics (MERGE) study 24 for rare homozygous loss-of-function (LoF) variants in genes preferentially expressed in the human testis.On the 61 identi ed genes, we performed a gene ontology-based two-tiered hierarchical clustering of signi cantly enriched biological processes that showed a striking enrichment of categories associate with "piRNA processing" (Fig. 1a, Supplementary Fig. 2).Further analysis revealed that piRNA pathway genes also contributed to the most signi cantly enriched individual processes (Fig. 1b,).Next, we screened the MERGE data from 2,127 infertile men with azoo-, cryptozoo-, extreme or severe oligozoospermia (< 2/<10 million total sperm count; Online methods) for biallelic, high-impact variants (minor allele frequency [MAF] in gnomAD < 0.01; LoF or missense variants with CADD score ≥ 15) in 24 human orthologues of murine genes associated with piRNA biogenesis (Fig. 1c; Supplementary Table 1) and identi ed 31 men carrying variants ful lling the selection criteria (Table 1).    1, also including reference transcripts).The three FKBP6 variant carriers and also one of the TDRD12 variant carriers have been described previously 21,23 .Detailed analysis of the exomes from each variant carrier did not reveal any other variants with a higher probability for causing the disease.In two cases, chromosomal translocations were identi ed (Extended data Table 2) and it cannot be excluded that they at least partially contribute to the patient's phenotype.
Of the 31 affected men, 19 were azoospermic, nine were cryptozoospermic, and four had extreme oligozoospermia.Twenty-three patients underwent a testicular biopsy with the aim of sperm extraction (TESE), which was negative in 22 men, i.e., no spermatozoa could be obtained.The analyses of testicular tissue revealed phenotypes ranging from complete absence of germ cells (Sertoli cell-only, SCO, N = 7), the presence of spermatocytes (meiotic arrest, MeiA, N = 6), round spermatids (RsA, N = 7) or elongated spermatids (ES+, N = 3) as the most advanced germ cells (Supplementary Fig. 3, Extended data Table 2).
In addition, screening of exome data from three independent cohorts of infertile men identi ed eight further patients with homozygous high-impact variants (two LoF and six missense) in GPAT2, PIWIL2, MOV10L1, and TDRD12 (Table 1, Extended data Table 1), bringing the total number of variant carriers to 39.In six of the additional patients, the testicular phenotypes of SCO or spermatogenic arrest con rmed the clinically suspected non-obstructive azoospermia (Extended data Table 2).In summary, 38 distinct variants in piRNA genes were identi ed among 39 infertile men.Of these variants, 12 were absent from gnomAD (version v2.1.1)and 18 were extremely rare (MAF ≤0.0001) (Extended data Table 1).
Variants in genes encoding components of the piRISC complex Pachytene piRNAs have been proposed to direct PIWIL1 and PIWIL2 to cleave speci c mRNAs and thereby regulate gene expression 25 .The slicing activity of this piRNA-induced silencing complex (piRISC) requires GTSF1 as an auxiliary factor 26 .We identi ed four azoospermic men with biallelic variants in genes encoding proteins essential for piRISC activity (Table 1).In PIWIL1, a homozygous stop-gain variant c.688C > T p.(Arg230*) localizing within the PAZ domain (Fig. 2a, Extended data Fig. 1a) was identi ed.The variant carrier exhibited CREM-positive, haploid, round spermatids as the most advanced germ cells in the seminiferous tubules (Fig. 2b) and TESE was negative (Extended data Table 2).Further staining demonstrated the absence of PIWIL1 in testicular germ cells, which is expressed in spermatocytes and haploid germ cells in a control with normal spermatogenesis (Fig. 2c).
Furthermore, two azoospermic men carried two different homozygous missense variants, in PIWIL2, c.1697G > A p.(Arg566His) and c.839A > C p.(Tyr280Ser), affecting amino acids conserved up to zebra sh (Extended data Fig. 1b).Arg566 is predicted to be a surface-accessible residue found within the linker 2 (L2) domain of PIWIL2 27 which bridges the PAZ and MID domains (Fig. 2a).Tyr280 is located within the structured N-terminal region of PIWIL2, which has been suggested to stabilize piRNA-target duplex conformations 28 .Finally, two men with meiotic arrest were carriers of homozygous, high-impact variants in GTSF1 (Fig. 2a, Extended data Fig. 1c).The frameshift variant c.221_222del p.(Arg74Lysfs*4) was predicted to result in nonsense-mediated decay (NMD), leading to abolished GTSF1 expression in the patient's spermatocytes (Fig. 2c).The missense variant c.97C > A p.(His33Asn) is located in a predicted α−helical protein domain and affects the second histidine of the highly conserved GTSF1 rst zinc nger motif (Extended data Fig. 1c).

Variants in genes involved in piRNA metabolic processes
In the primary mammalian piRNA pathway, RNA helicase MOV10L1 selectively binds to cytoplasmic piRNA precursor transcripts 29 and feeds them to the mitochondrial-associated endonuclease PLD6, which catalyzes the rst cleavage step of piRNA processing 30 .
PLD6 activity requires a GPAT protein to cleave pre-piRNAs 30 and in mammals, the mitochondrialassociated protein GPAT2 is crucial for primary piRNA biogenesis 31 .Biallelic variants in GPAT2 were detected in eight men with negative TESE attempt (Extended data Table 2; Fig. 3a; Extended data Fig. 3).
Five of these variant carriers share a severe testicular SCO phenotype, two had meiotic arrest, while only one presented with hypospermatogenesis leading to extreme oligozoospermia ( Fig. 3b).The homozygous missense variant c.1879C > T p.(Arg627Trp) was identi ed in the unrelated patients M690 and M1844, and both parents of M1844 carried this variant in the heterozygous state (Extended data  3c,d).In M13, p.(His377Arg) was inherited from the mother, who was not carrier of the second variant p. (Arg652*) and p.(Arg49His) was also identi ed in MI-0042P in the homozygous state.Finally, the splice acceptor variant c.1156-1G > A (M2556) in GPAT2 resulted in skipping of exon 12, as con rmed by a minigene assay (Extended data Fig. 4a).Of note, GPAT2 expression was absent or strikingly reduced in the spermatocytes of the two GPAT2 variant carriers analyzed (Fig. 3c).
Furthermore, we identi ed three patients with homozygous missense variants in PNLDC1 (M3274, M1125) and HENMT1 (M3079) (Fig. 3a) which both play a crucial role in piRNA maturation 32,33 .The two PNLDC1 variant carriers exhibited cryptozoospermia and, ttingly, the testicular biopsy of M112 revealed only a few tubules with elongated spermatids while TESE was negative.(Extended data Table 2).The missense variants impact two highly conserved amino acid residues, both situated in the PNLDC1 CAF domain (Extended data Fig. 4b).The homozygous variant c.400A > T p.(Ile134Leu) in HENMT1 was identi ed in an azoospermic men with round spermatid arrest (Fig. 3b).The affected tyrosine residue is located in the protein's methyl-transferase domain and is conserved up to zebra sh (Extended data Fig. 4c).
Finally, we also identi ed biallelic variants in genes that are limited to secondary biogenesis or post piRNA maturation processes.Among these, DDX4 encodes a germ cell-speci c RNA helicase required for ribonucleoprotein remodeling during the loading of secondary piRNA intermediates onto PIWIL4 17 .In DDX4, the homozygous missense variant c.1532C > T p.(Ala511Val) was identi ed in an infertile men with cryptozoospermia due to predominant round spermatid arrest in the testicular tissue (M928, Fig. 3b).Alanine 511 is present in orthologous proteins up to fruit y and is located in a highly conserved core protein region between the two predicted helicase domains (Extended data Fig. 5a).However, the cellular expression pro le of DDX4 in the patient's testicular tissue remained unchanged (Supplementary Fig. 4a).
The piRNA pathway component MAEL localizes to the cytoplasm and shuttles to the nucleus in round spermatids 34 .It may also facilitate nucleo-cytoplasmic tra cking of PIWIL4-piRNA complexes 35 and pachytene piRNA intermediate processing 36 .M2435 carried the con rmed compound heterozygous stop-gain variant c.799C > T p.(Arg267*) and the splice site variant c.908 + 1G > C p.? in MAEL, which was shown to cause skipping of exon 9 (Extended data Fig. 5b), resulting in an in-frame deletion of 21 amino acids.No spermatozoa could be retrieved from the testicular biopsy (Extended data Table 2) showing pachytene spermatocytes and few CREM-positive haploid germ cells (Supplementary Fig. 3) in single tubules, indicating a spermatogenic arrest at stages downstream of pachytene or after completion of meiosis.Assumed degradation of mutant MAEL transcripts by NMD was supported by absence of MAEL-speci c staining in spermatocytes and round spermatids in patient testicular sections (Fig. 3c).

Variants in the scaffold proteins encoded by the TDRD gene family
Tudor domain (TD)-containing proteins (TDRDs) play a crucial role as molecular scaffolds in piRNA biogenesis 37 and, in mice, several members of the TDRD gene family have been linked to piRNA biogenesis.We identi ed rare homozygous variants in TDRD1, TDRD9, and TDRD12 that matched our ltering criteria.
The missense variant c.887C > A p.(Ser296Tyr) in TDRD1 in patient (M1648) with meiotic arrest affects a highly conserved serine residue located in the rst TD (Fig. 4A; Extended data Fig. 6a).In testicular tissue with complete spermatogenesis, TDRD1 is expressed in perinuclear structures within round spermatids (Supplementary Fig. 4b).Because the seminiferous tubules of M1648 lack haploid germ cells, it remains unknown whether p.(Ser296Tyr) has any effect on the expression or stability of TDRD1.
In TDRD9, we identi ed four infertile men with homozygous nucleotide substitutions predicted to affect the protein sequence: two homozygous LoF variants, c.3148dup p.(Val1050Glyfs*49) and c.3716 + 3A > G p.?, which causes skipping of TDRD9 exon 32 (Extended data Fig. 6c), resuling in a frameshift and two missense variants, p.(Val415Phe) and p.(Val1276Phe) (Table 1, Fig. 4a).The affected valine 415 is located in the helicase domain of TDRD9, whereas valine 1276 is located in a C-terminal protein region (Fig. 4a) and both amino acids are conserved in orthologous proteins (Extended data Fig. 6b).
Interestingly, haploid spermatozoa with impaired motility and abnormal morphology were observed in all four TDRD9 variant carriers (Table 1, Extended data Table 2).
Finally, in TDRD12, seven men with homozygous high-impact variants (Table 1, Fig. 4a) were identi ed out of whom ve had a negative TESE attempt.Two patients with SCO were carriers of homozygous missense variants c.287A > C p.(Asp96Ala) and c.593A > G p.(Asn198Ser), respectively, co-segregating with the disease in the families (Extended data Fig. 7a,b).TP5 with meiotic arrest carried the homozygous splice site variant c.963 + 1G > T p.?, leading to skipping of exon 9 (Extended data Fig. 7c).As a result the open reading frame is shifted resulting in the synthesis of a truncated protein, p.
(Asp289Alafs*3) if the mutant transcript is not degraded by NMD.
Interestingly, in the four other TDRD12 variant carriers, haploid germ cells (round or elongated spermatids) were detected in the seminiferous tubules (Fig. 4b, Supplementary Fig. 3) or spermatozoa were found in the ejaculate.Two of these patients were carriers of homozygous LoF variants, the stop gain variant c.986G > A p.(Trp329*) and the frameshift variant c.3157del p.(Leu1053Phefs*4) (Extended data Fig. 7d,e).The other two TDRD12 variants result in the substitution of highly conserved arginine residues Arg807 and Arg811 (Extended data Fig. 7f) which are found on the surface of a conserved and globular structured region of TDRD12 with unknown molecular function.

Impact of identi ed variants on expression of piRNA pathway components
Furthermore, we aimed to explore whether the loss-of-function of one piRNA biogenesis related protein might in uence the expression pro le of further proteins involved in this metabolic process.To this end, we performed immunohistochemical staining for several key piRNA pathway-related proteins (Fig. 5a,b, Extended data Fig. 8, Supplementary Fig. 4-7).Indeed, a diminished expression of PIWIL1, PLD6, GPAT2, and HENMT1 in spermatocytes was observed in several variant carriers.TDRD1-speci c staining was absent only in round spermatids of the PIWIL1 stop-gain variant carrier M2006, and characteristic and distinct staining of MAEL-positive structures was more diffuse in spermatocytes of the TDRD1 and GPAT2 variant carriers.Of note, the staining pattern of DDX4 and GTSF1 was not affected in any of the variant carriers analyzed.Collectively, these observations indicate a gene-/protein-speci c impact of several of the piRNA biogenesis proteins on the expression of additional piRNA factors.

Impact of identi ed variants on piRNA processing and transposon silencing
In the patients with available snap frozen testicular tissue and who were not affected by total germ cell loss [M2006: PIWIL1 p.(Arg230*), M1648: TDRD1 p.(Ser296Tyr), M2595: TDRD12 p.(Leu1053Phefs*4), M2317, TDRD12 p.(Arg811Gln)], we analyzed the impact on piRNA biogenesis in germ cells and performed small-RNA sequencing.The mapped piRNA sequences were intersected with known pachytene piRNA loci detected in the human adult testis.This revealed signi cantly decreased amounts of piRNAs in all four patients, compared with tissue with complete spermatogenesis (Fig. 5c).Notably, the peak of piRNAs, with a length of 28-31 bases seen in the control tissue was absent in all four samples.Accordingly, approximately 50% of all piRNAs in the mutant samples had a length between 25 and 27 bases, which was the case for only 13% of piRNAs in the control tissue (Supplementary Fig. 8a).
In mice, disruption of piRNA biogenesis leads to upregulation of transposons.We, therefore, investigated the silencing of transposons in human male germ cells and performed immunohistochemical staining for LINE1 open reading frame 1 protein (LINE1 ORFp1) in the testicular sections of variant carriers.Using a monoclonal, validated antibody directed against human LINE1 ORFp1, no staining was detected in germ cells of human control samples with complete spermatogenesis.In contrast, a concordant and speci c expression of LINE1 ORFp1 in spermatogonia (Fig. 5d) was observed in three TDRD12, two GPAT2, three TDRD12, and single MAEL and HENMT1 variant carriers, while in all other cases, including carriers of homozygous LoF variants in PIWIL1 and GTSF1, no LINE1 ORFp1 was expressed.(Fig. 5d, Supplementary Fig. 8b).
In summary, in 13 variant carriers (9 biallelic LoF, 4 homozygous missense) functional data on piRNA factor expression, pachytene piRNA level and/or TE expression demonstrated the pathogenicity of the respective variants.

Discussion
The number of identi ed monogenic causes of male infertility due to impaired spermatogenesis is steadily increasing, and a striking number of described disease genes encode proteins with vital roles in meiosis 38 .Through comprehensive exploration of biallelic variants in exome/genome data of > 2,000 infertile men, we provide evidence that beyond meiosis-related genes, genes encoding proteins involved in piRNA biogenesis are a major, previously underexplored contributor to human spermatogenic failure.
Biallelic LoF variants in 14 piRNA genes, including PIWIL1, GTSF1, PLD6, GPAT2, and MAEL, were identi ed in infertile men for the rst time, establishing them as novel disease genes.Furthermore, no homozygous potentially pathogenic missense variants had yet been reported for TDRD1 and DDX4, both of which are highly intolerant to genetic variation.Of note, the presented homozygous stop-gain variant in PIWIL1 resolves the controversy regarding the causal link between heterozygous human PIWIL1 variants and azoospermia 39,40 and renders this another autosomal recessive disease gene.
Among the genes highlighted in this study, FKBP6 23 , PIWIL2 20,41 , PNLDC1 21,22 , PLD6 21 , HENMT1 18 , MOV10L1 42 , TDRD9 18,19,21 , and TDRD12 21 were recently described in the context of piRNA pathway dysfunction and/or human male infertility and the discovery of additional variants signi cantly strengthens the gene-disease relationship.For several of these genes, we identi ed similar testicular phenotypes as previously reported: SCO in PIWIL2 41 , round spermatid arrest in PNLDC1 22 , and hypospermatogenesis in TDRD9 variant carriers 18 , respectively.Interestingly, also the gene-speci c patients' phenotypes observed in this study were largely consistent and severity was independent from the type of variant, indicating that the identi ed missense variants are indeed also loss-of-function variants on the protein level (Fig. 6a).Thus, the gene-speci c testicular phenotype can be used to aid assessment of the variant's pathogenicity.In contrast, TDRD12 variant carriers exhibited highly variable phenotypes, ranging from SCO to even a few spermatozoa in the ejaculate.In summary, with at least four different biallelic variants (including several LoF variants) identi ed per gene in this and other studies, GPAT2, PNLDC1, TDRD12, MOV10L1, PLD6, FKBP6, and TDRD9 are excellent candidates to be included in the diagnostic workup of infertile men.
This study also allows a comprehensive comparison of reproductive phenotypes per gene between mice and men, which reveals only a partial gene-speci c overlap (Fig. 6b).The phenotypic spectrum in humans is signi cantly broader, ranging from complete absence of spermatozoa to hypospermatogenesis resulting in severe oligozoospermia, compared with mice, in which round spermatid or meiotic arrest have predominantly been described.More speci cally, the round spermatid arrest observed in Miwi (Piwil1) knockout mice 43 was also present in the PIWIL1 variant carrier.In contrast, SCO was the predominant testicular phenotype in human variant carriers in PIWIL2, PLD6, and GPAT2, whereas the corresponding knockout mice exhibited meiotic arrest 12,14,44 .Only for the aged PIWIL2 knockout mice, 50% of tubuli revealed loss of germ cells 45 .Of note, the more severe phenotype observed in the human variant carriers was not associated with patient's age.In addition, carriers of FKBP6 or TDRD9 presented with round spermatid arrest or even hypospermatogenesis resulting in severe oligozoospermia, while the respective knockout mice showed meiotic arrest 46,47 .Thus, in some instances, human spermatogenesis seems to be less stringently controlled and progresses further despite disrupted piRNA biogenesis.It remains to be determined whether the round spermatids/spermatozoa produced in some men are actually suitable for procreation.
For several of the identi ed variants, we demonstrate functional data linking impaired function or absence of the encoded protein to downstream cellular effects.The observed diminished expression of further piRNA factors in variant carriers was also described for PNLDC1 22 , where it has been linked to decreased expression of MYBL1, a testis-speci c transcription factor known to regulate expression of pachytene piRNAs as well as several piRNA factor genes in mice.
In some piRNA factor knockout mice, impaired piRNA biogenesis resulted in de-repression of TEs in spermatocytes of the adult testis 9 .Surprisingly, we observed a spermatogonia-speci c de-repression of LINE transposons in GPAT2, TDRD12, FKBP6, HENMT1, and MAEL variant carriers.On the contrary, homozygous LoF variant carriers in PIWIL1 and GTSF1 lacking the encoded protein, did not demonstrate TE de-repression.Thus, de-repression of transposons does not seem to be a general consequence of impaired pachytene piRNA biogenesis.This is supported by the fact that this subtype of piRNAs, which comprises over 95% of all piRNAs in the adult mouse testis 1 , is depleted of transposon sequences 48 and has predominantly been linked to spermatogenic gene expression 5,49 .Further, Miwi (PIWIL1) expression is restricted to the adult testis 43 and, therefore, supposed to be essential only for pachytene piRNAmediated functions 5 .However, there are some human piRNA genes such as GPAT2, FKBP6, TDRD12 that, when impaired, concordantly result in LINE1 de-repression in spermatogonia and the encoded proteins might also be involved in biogenesis of pre-pachytene piRNAs.These piRNAs are bound predominantly by the slicer PIWIL2, which is expressed at all stages of male germ cell maturation, and are enriched in TEs 8 and 3'-UTR sequences 50 and are likely involved in the control of harmful transposon expression.
In conclusion, this study provides extensive data linking disrupted piRNA biogenesis to human spermatogenic failure, demonstrates that piRNA pathway genes are a major target for scrutinizing genetic causes of male infertility, and suggests that safeguarding of the genome during spermatogenesis is less stringent in men than in mice.The detailed characterization of pathogenic human variants provides insight into the molecular function of the factors involved in piRNA biogenesis and piRNA-mediated transposon silencing.This opens the possibility to investigate key protein domains and, in parallel, to assess the pathogenicity of gene variants.

Study cohorts
Four cohorts of whole exome (WES) or whole genome sequencing data (WGS) of infertile men were included in this study.The MERGE cohort includes data of 2,412 men (2,352 with WES and 60 with WGS) with various infertility phenotypes and >90% of these men were recruited at the Centre of Reproductive Medicine and Andrology (CeRA), Münster.Most men of this cohort are azoospermic (N = 1,448) or have severely reduced sperm counts: N = 454 with cryptozoospermia (sperm only identi ed after centrifugation of the ejaculate); N = 158 with extreme oligozoospermia (sperm count <2 million); N = 67 with severe oligozoospermic (sperm count <10 million).Numerical chromosomal aberrations such as Klinefelter syndrome (karyotype 47,XXY) and Y-chromosomal AZF-deletions are exclusion criteria.Likely pathogenic monogenic causes for the infertile phenotype were already described in about 8% of cases 24 .
Exome sequencing and bioinformatics analyses in the MERGE cohort were performed as previously described 24 .

Variant calling
After trimming of remaining adapter sequences and primers with Cutadapt v1.15 53 , reads were aligned against GRCh37.p13using BWA Mem v0.7.17 54 .Base quality recalibration and variant calling were performed using the GATK toolkit v3.8 55 with haplotype caller according to the best practice recommendations.For more recent samples and whole genome samples Illumina Dragen Bio-IT platform v4.2 was used for alignment and variant calling.Both pipelines use GRCh37.7.p13 as reference genome.Resulting variants were annotated with Ensembl Variant Effect Predictor 56 .

Screening of WES data for biallelic high impact variants
To identify potentially harmful gene variants in genes of the piRNA pathway WES of infertile from all cohorts included in this study were screened to identify individuals with biallelic high-impact variants (stop-gain, start-loss, splice site and splice region, deletions and insertions as well as missense variants with CADD ≥15) in a total of 24 different genes of the pathway (Table S1).Only variants with a MAF £0.01 (gnomAD database, v 2.1.1)were taken into account.
To exclude presence of additional possibly pathogenic variants WES data were additionally screened for additional rare homozygous high-impact variants (LoF and missense variants with CADD ≥20) occurring in a list of 21 azoospermia-associated genes with at least moderate clinical validity 24 and 363 candidate genes associated with the Gene Ontology classi cation "male infertility" in the Mouse Genome Informatics Database (MGI) revealing strong expression in human male germ cells.Patients in which additional candidate variants were identi ed were excluded from further analysis.

Further genetic analysis
Validation of variants as well as co-segregation analyses were performed by Sanger sequencing.The regions of interest were ampli ed from patients genomic DNA as well as available family members with the use of primers and conditions as listed in Table S2.The PCR products were then puri ed and sequenced using standard protocols.For validation of variants in GPAT2 long range PCR products using GPAT2 speci c primers that do not bind to pseudogenes GPAT2P1 and GPAT2P2 were ampli ed and used as template for subsequent nested PCR and Sanger sequencing.If a variant was found in more than one individual in MERGE (GPAT2: c.1879C>T in M690 and M1844 and c.1130A>G in M13 and M454) the relationship between the two mutation carriers was determined using the Somalier tool 64 .In case that no parental DNA was available for analysis,biallelic occurrence of heterozygous variants was determined by long-read sequencing using long-range PCR products encompassing both genomic regions of interest ampli ed from variant carriers as template for library generation.

NGS library preparation and long read sequencing using the MinION system
To determine if two heterozygous variants identi ed in one gene of the same patient occur in cis or in trans, a long read sequencing approach was used.To this end, a long-range PCR product encompassing both variants was ampli ed (see Supplemental Table 2 for primer information) from patients genomic DNA using the TAKARA LA Taq® DNA Polymerase Hot-Start Version. 1 µg of PCR products was used for subsequent preparation of MinION sequencing library.Barcoding and sequencing was carried out according to manufacturer's instructions (MinION, Oxford Nanopore Technologies).After demultiplexing of obtained reads, alignment to human reference hg19, quality control and variant calling phasing of variants on same/different alleles was determined.

Minigene assay
To determine the functional impact of splice site region variants, an in vitro splicing assay based on a minigene construct was performed.

Histology and Immunohistochemical staining
Testis biopsies of patients from the MERGE cohort and control subjects were obtained from testicular sperm extraction (TESE) approaches or histological evaluation at the Department of Clinical and Surgical Andrology (University Hospital Münster, Germany) and research purposes.Biopsies were xed in Bouin's solution overnight, washed with 70% ethanol and embedded in para n for routine histological evaluation.Subsequently, 5 µm sections were stained with periodic acid-Schiff (PAS) according to previously published protocols 68 .Testis biopsies of patients from Giessen were processed equally but stained with hematoxylin and eosin (HE) following previously published protocols 69 .Testis biopsies of patients from BCN were treated as described previously 51 .
For immunohistochemical analyses, 3 µm sections of testicular tissue were de-para nized and rehydrated as described previously 70 .After rinsing with tap water (15 min, RT) heat-induced antigen retrieval was performed in HIER buffer (pH 6) or as indicated in table S3.This step was followed by cooling and washing with 1X Tris-buffered saline (TBS) before endogenous peroxidase activity was blocked using 3% hydrogen peroxide (15 min, RT).Blocking was performed by adding 25% goat serum (#ab7481, abcam) in TBS containing 0.5% bovine serum albumin (BSA, #A9647, Merck, 30 min, RT).
Sections were incubated overnight at 4°C in primary antibody solution, including 5% BSA/TBS and primary antibody as indicated in table S3.The following day, sections were washed with 1x TBS and incubated with a corresponding biotinylated secondary antibody in 5% BSA/TBS for 1h.After washing with TBS, sections were incubated with streptavidin-horseradish peroxidase (Merck -1:500, 45 min, RT) diluted in 5% BSA/TBS.Subsequently, sections were washed with TBS and incubated with 3,3'-Diaminobenzidine tetrahydrochloride (DAB, Merck) for visualisation of antibody binding.Staining was validated by microscopical acquisition and stopped with aqua bidest.Counterstaining was conducted using Mayer's hematoxylin (Merck).Finally, sections were rinsed with tap water, dehydrated with increasing ethanol concentrations and mounted using M-GLAS® mounting medium (Merck).
Slides were evaluated and documented using a PreciPoint O8 with Bowtie (v.#1.0.1) 71 allowing only perfect matches, discarded miRNAs by selecting reads between 25 and 45 bases, and re-aligned to GRCh37 allowing one mismatch.Finally, known small non-coding RNAs, other than piRNAs, were removed from the dataset using DASHRv2 (v.#v2) 72 and the remaining piRNA sequences were intersected with known piRNA loci detected in human adult testis 3 .For statistical analysis data from small RNA-seq experiments were evaluated using SciPy (ver.: 1.8.0) packages 73 .Shapiro-Wilk test for normality of the data and Student's T-test for expression changes in piRNA quantities of different lengths were used.Supplementary Files

Fig.
Fig.3a).Patients 17-051 and 15-0730 were carriers of c.1388C > T p.(Thr463Met) (Extended data Fig.3b).Both originate from Morocco, and a Somalier analysis indicated that they are distantly related.M13 and M454 were compound heterozygous for the missense variant c.1130A > G p.(His377Arg), which is located in the protein's GPAT/DHAPAT acetyltransferase domain and the stop-gain variant c.1954C > T p.(Arg652*) or the missense variant c.146G > A p.(Arg49His), respectively (Extended data Fig.3c,d).In M13, p.(His377Arg) was inherited from the mother, who was not carrier of the second variant p. (Arg652*) and p.(Arg49His) was also identi ed in MI-0042P in the homozygous state.Finally, the splice landscape of piRNA biogenesis-relate male infertility.a. Work ow of scrutinizing biological processes related to genetically determined reduced sperm count and male infertility by gene ontology (GO) analysis.Pie chart shows rst hierarchy of the two-tiered hierarchy.b.Side-ways bar chart showing -log10(P-value) of individual GO terms.Number of piRNA pathway factors associated with GO term shown to the right of each bar and GO terms associated exclusively with piRNA pathway factors highlighted in blue.c.Schematic overview on mammalian piRNA biogenesis related sub-pathways with proteins factors known to be involved from mice knockout studies.Proteins in which encoded biallelic high-impact variants were identi ed in infertile men are underlined.

Table 1
Biallelic high-impact variants identi ed in genes of the piRNA pathway in infertile men due to azoo-, crypto, or extreme oligozoospermia.

Table 1 ,
Fig. 1c, Extended data Table 21lready described23; already described21; these are also shaded in grey; a, positive TESE; Azoo, azoospermia; MeiA, meiotic (spermatocyte) arrest; RsA, round spermatid arrest; DNA was extracted from peripheral blood leukocytes via standard methods.For exome sequencing of the MERGE and Strasbourg cohort, the samples were prepared and enrichment was carried out according to the protocol of either Agilent's SureSelectQXT Target Enrichment for Illumina Multiplexed Sequencing Featuring Transposase-Based Library Prep Technology (Agilent) or Twist Bioscience's Twist Human Core Exome.To capture libraries, Agilent's SureSelect Human All Exon Kits V4, V5 and V6 or Twist Bioscience's Human Core Exome plus RefSeq spike-in and Exome 2.0 plus comprehensive spike-in were used.For whole genome sequencing of samples from the MERGE cohort sequencing libraries were prepared according to Illumina's DNA PCR-Free library kit.For multiplexed sequencing, the libraries were index tagged using appropriate pairs of index primers.Quantity and quality of the libraries were determined with the ThermoFisher Qubit, the Agilent TapeStation 2200 and TecanIn nite 200 Pro Microplate reader, respectively.Sequencing was performed on the Illumina HiSeq 4000 System, the Illumina HiSeqX System, the Illumina NextSeq 500 System, the Illumina NextSeq 550 Barcelona: (2014/04c); Newcastle: (Newcastle:REC ref. 18/NE/0089), Nijmegen: (NL50495.091.14 version 5.0).All experiments were performed in accordance with relevant guidelines and regulations.Whole exome and genome sequencing Genomic The region of interest was ampli ed from genomic DNA of the respective patient as well as of a human control sample by standard PCR procedures.To analyse splice effect of variants GPAT2 c.1156-1G>A, MAEL c.908+1G>C and TDRD9 c.3716+3A>G products were cloned into pENTR™/D-TOPO® according to manufacturer's instructions.The subsequent gateway cloning was performed using Gateway™ LR Clonase™ Enzyme Mix and pDESTsplice as destination vector (pDESTsplice was a gift from Stefan Stamm (Addgene plasmid #32484) 65 .To analyse TDRD12 c.963+1G>T ampli ed region encompassing exon 8-10 of TDRD12 was subcloned into pcDNA3.1 and for MOV10L1 c.2179+3A>G into pSPL3B.A transient transfection with mutant and wildtype Minigene constructs was performed using Human Embryonic Kidney cells (HEK293T Lenti-X, Clontech Laboratories, Inc.®).Total RNA was extracted using the RNeasy Plus Mini Kit (QIAGEN®) and reversetranscribed into cDNA with the ProtoScript® II First Strand cDNA Synthesis Kit (New England Biolabs GmbH®).Ampli cation of the region of interest was performed and RT-PCR products were separated on a 2% agarose gel, cut out, extracted and sequenced.